
    On the capacity of information processing systems

    We propose and analyze a family of information processing systems, where a finite set of experts or servers is employed to extract information about a stream of incoming jobs. Each job is associated with a hidden label drawn from some prior distribution. An inspection by an expert produces a noisy outcome that depends both on the job's hidden label and the type of the expert, and occupies the expert for a finite time duration. A decision maker's task is to dynamically assign inspections so that the resulting outcomes can be used to accurately recover the labels of all jobs, while keeping the system stable. Among our chief motivations are applications in crowd-sourcing, diagnostics, and experimental design, where one wishes to efficiently learn the nature of a large number of items using a finite pool of computational resources or human agents. We focus on the capacity of such an information processing system. Given a required level of accuracy, we ask how many experts are needed in order to stabilize the system, and through what inspection architecture. Our main result provides an adaptive inspection policy that is asymptotically optimal in the following sense: the ratio between the required number of experts under our policy and the theoretical optimum converges to one as the probability of error in label recovery tends to zero.
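
    As a rough illustration of the setup (and not the paper's actual policy), the sketch below simulates repeated inspections of a single binary-labeled job by one expert, running a Bayesian posterior update until a target error probability is met; the noise level p_flip and threshold delta are hypothetical parameters.

```python
# Illustrative sketch only: one job with a hidden binary label, inspected by a
# single expert whose outcome flips the true label with probability p_flip.
# We run a Bayesian posterior update until the posterior error probability
# drops below delta. All parameters are hypothetical; the paper's policy
# adaptively routes jobs across a pool of heterogeneous experts.
import random

def inspections_needed(prior=0.5, p_flip=0.2, delta=1e-3, seed=0):
    rng = random.Random(seed)
    label = rng.random() < prior          # hidden label in {0, 1}
    belief = prior                        # P(label = 1 | outcomes so far)
    count = 0
    while min(belief, 1 - belief) > delta:
        outcome = label if rng.random() > p_flip else not label
        like1 = (1 - p_flip) if outcome else p_flip  # P(outcome | label = 1)
        like0 = p_flip if outcome else (1 - p_flip)  # P(outcome | label = 0)
        belief = like1 * belief / (like1 * belief + like0 * (1 - belief))
        count += 1
    return count

print(inspections_needed())  # inspections needed to hit the accuracy target
```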

    Gossiping with Multiple Messages

    This paper investigates the dissemination of multiple pieces of information in large networks where users contact each other in a random uncoordinated manner, and users upload one piece per unit time. The underlying motivation is the design and analysis of piece selection protocols for peer-to-peer networks which disseminate files by dividing them into pieces. We first investigate one-sided protocols, where piece selection is based on the state of either the transmitter or the receiver. We show that any such protocol relying only on pushes, or alternatively only on pulls, is inefficient in disseminating all pieces to all users. We propose a hybrid one-sided piece selection protocol -- INTERLEAVE -- and show that by using both pushes and pulls it disseminates $k$ pieces from a single source to $n$ users in $10(k+\log n)$ time, while obeying the constraint that each user can upload at most one piece in one unit of time, with high probability for large $n$. An optimal, unrealistic centralized protocol would take $k+\log_2 n$ time in this setting. Moreover, efficient dissemination is also possible if the source implements forward erasure coding and users push the latest-released coded pieces (but do not pull). We also investigate two-sided protocols, where piece selection is based on the states of both the transmitter and the receiver. We show that it is possible to disseminate $n$ pieces to $n$ users in $n+O(\log n)$ time, starting from an initial state where each user has a unique piece. Comment: Accepted to IEEE INFOCOM 200
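
    The abstract does not spell out INTERLEAVE's selection rule, so the following is only a generic push/pull multi-piece gossip simulation, useful for getting a feel for dissemination times; the random-useful-piece choice is a placeholder, not the paper's protocol.

```python
# Generic push/pull gossip sketch (NOT the INTERLEAVE protocol): a source
# holding k pieces plus n users; each round, every node contacts a uniformly
# random peer and transfers one piece chosen at random among the useful ones
# (push if the contacted peer lacks something, otherwise pull). Enforcing the
# upload constraint per contact only is a simplification of the paper's model.
import random

def gossip_rounds(n=200, k=10, seed=1):
    rng = random.Random(seed)
    full = set(range(k))
    have = [set(full)] + [set() for _ in range(n)]  # node 0 is the source
    rounds = 0
    while any(h != full for h in have[1:]):
        rounds += 1
        for u in range(len(have)):
            v = rng.randrange(len(have))
            if v == u:
                continue
            push = have[u] - have[v]
            pull = have[v] - have[u]
            if push:
                have[v].add(rng.choice(sorted(push)))
            elif pull:
                have[u].add(rng.choice(sorted(pull)))
    return rounds

print(gossip_rounds())  # compare with the centralized k + log2(n) benchmark
```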

    An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums

    Modern large-scale finite-sum optimization relies on two key aspects: distribution and stochastic updates. For smooth and strongly convex problems, existing decentralized algorithms are slower than modern accelerated variance-reduced stochastic algorithms when run on a single machine, and are therefore not efficient. Centralized algorithms are fast, but their scaling is limited by global aggregation steps that result in communication bottlenecks. In this work, we propose an efficient Accelerated Decentralized stochastic algorithm for Finite Sums named ADFS, which uses local stochastic proximal updates and randomized pairwise communications between nodes. On $n$ machines, ADFS learns from $nm$ samples in the same time it takes optimal algorithms to learn from $m$ samples on one machine. This scaling holds until a critical network size is reached, which depends on communication delays, on the number of samples $m$, and on the network topology. We provide a theoretical analysis based on a novel augmented graph approach combined with a precise evaluation of synchronization times and an extension of the accelerated proximal coordinate gradient algorithm to arbitrary sampling. We illustrate the improvement of ADFS over state-of-the-art decentralized approaches with experiments. Comment: Code available in source files. arXiv admin note: substantial text overlap with arXiv:1901.0986
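
    A heavily simplified sketch of the decentralized pattern (local stochastic updates plus randomized pairwise communication) is given below; it omits the proximal operators, augmented graph, and acceleration that make ADFS fast, and all names and data are synthetic.

```python
# Heavily simplified decentralized sketch in the spirit of ADFS: each node
# holds a parameter vector and local least-squares samples, and alternates a
# local stochastic gradient step with pairwise averaging along a ring. The
# real algorithm uses proximal updates, an augmented graph, and acceleration;
# none of that is reproduced here.
import random

def decentralized_sgd(n_nodes=8, m=50, dim=5, steps=5000, lr=0.05, seed=2):
    rng = random.Random(seed)
    data = [[([rng.gauss(0, 1) for _ in range(dim)], rng.gauss(0, 1))
             for _ in range(m)] for _ in range(n_nodes)]
    x = [[0.0] * dim for _ in range(n_nodes)]
    for _ in range(steps):
        i = rng.randrange(n_nodes)
        a, b = data[i][rng.randrange(m)]           # one local sample
        err = sum(ai * xi for ai, xi in zip(a, x[i])) - b
        x[i] = [xi - lr * err * ai for xi, ai in zip(x[i], a)]
        j = (i + 1) % n_nodes                      # pairwise gossip with a neighbor
        avg = [(u + v) / 2 for u, v in zip(x[i], x[j])]
        x[i], x[j] = avg, list(avg)
    return x

params = decentralized_sgd()
```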

    Adaptive Matching for Expert Systems with Uncertain Task Types

    A matching in a two-sided market often incurs an externality: a matched resource may become unavailable to the other side of the market, at least for a while. This is especially an issue in online platforms involving human experts, as expert resources are often scarce. The efficient utilization of experts in these platforms is made challenging by the fact that the information available about the parties involved is usually limited. To address this challenge, we develop a model of a task-expert matching system where a task is matched to an expert using not only the prior information about the task but also the feedback obtained from past matches. In our model the tasks arrive online while the experts are fixed and constrained by a finite service capacity. For this model, we characterize the maximum task resolution throughput a platform can achieve. We show that the natural greedy approach, where each expert is assigned the task most suitable to her skill, is suboptimal, as it does not internalize the above externality. We develop a throughput-optimal backpressure algorithm which does so by accounting for the `congestion' among different task types. Finally, we validate our model and confirm our theoretical findings with data-driven simulations using logs of Math.StackExchange, a Stack Exchange forum dedicated to mathematics. Comment: Part of this work was presented at the Allerton Conference 2017; 18 pages
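
    A minimal sketch of a backpressure-style assignment rule of the kind the abstract describes: a freed expert serves the task class with the largest backlog-weighted resolution rate instead of the class she is individually best at. The rates matrix, queue values, and function name are hypothetical.

```python
# Hedged sketch of a backpressure-style rule: when an expert frees up, she
# serves the task class maximizing queue_length * resolution_rate rather than
# the class she is individually best at. The paper's algorithm additionally
# maintains beliefs over uncertain task types from past match feedback.
def backpressure_pick(queues, rates, expert):
    # queues[c]: backlog of class c; rates[expert][c]: resolution probability
    best = max(range(len(queues)), key=lambda c: queues[c] * rates[expert][c])
    return best if queues[best] > 0 else None

queues = [5, 2, 9]
rates = [[0.9, 0.4, 0.3],   # expert 0 is individually best at class 0
         [0.2, 0.8, 0.5]]
for e in range(2):
    print(f"expert {e} serves class {backpressure_pick(queues, rates, e)}")
```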

    Group Synchronization on Grids

    Group synchronization requires estimating unknown elements $(\theta_v)_{v\in V}$ of a compact group $\mathfrak{G}$ associated to the vertices of a graph $G=(V,E)$, using noisy observations of the group differences associated to the edges. This model is relevant to a variety of applications ranging from structure from motion in computer vision, to graph localization and positioning, to certain families of community detection problems. We focus on the case in which the graph $G$ is the $d$-dimensional grid. Since the unknowns $\theta_v$ are only determined up to a global action of the group, we consider the following weak recovery question. Can we determine the group difference $\theta_u^{-1}\theta_v$ between far-apart vertices $u, v$ better than by random guessing? We prove that weak recovery is possible (provided the noise is small enough) for $d\ge 3$ and, for certain finite groups, for $d\ge 2$. Vice versa, for some continuous groups, we prove that weak recovery is impossible for $d=2$. Finally, for strong enough noise, weak recovery is always impossible. Comment: 21 pages
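
    For concreteness, one standard instance of the observation model reads as follows (the abstract leaves the noise channel unspecified, so the uniform-flip channel below is an assumed example):

```latex
% One standard instance of group synchronization on a graph G = (V, E) with
% unknowns \theta_v in a compact group \mathfrak{G} (the abstract leaves the
% noise channel unspecified; the uniform-flip channel is an assumed example).
\[
  Y_{uv} =
  \begin{cases}
    \theta_u^{-1}\theta_v, & \text{w.p. } 1-\varepsilon,\\[2pt]
    \text{uniform on } \mathfrak{G}, & \text{w.p. } \varepsilon,
  \end{cases}
  \qquad (u,v) \in E.
\]
% Weak recovery asks for an estimator of \theta_u^{-1}\theta_v whose accuracy
% stays bounded away from random guessing as the distance between u and v grows.
```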

    Stability of non-Markovian polling systems

    In this article we consider polling systems with Markovian server routing, where each station is attended according to a specific policy. A stationary regime for this system is constructed under general statistical assumptions (stationarity, ergodicity) on the input processes to the stations; in particular, it is not required that these processes be mutually independent. The method of construction is as follows: one recursively constructs a sequence of stationary regimes for fictitious systems that approximate, in some sense, the original polling system; the stationary regime is then identified as the limit of this sequence of stationary processes. The main tools for these results are Palm calculus and Birkhoff's ergodic theorem. It is shown by a coupling argument that this stationary regime is minimal in the stochastic ordering sense. The assumptions on the service policies allow us to consider the purely gated policy, the a-limited policy, the binomial-gated policy, and others. As a by-product, sufficient conditions for the existence of a stationary regime of a G/G/1/0 queue with multiple server vacations are obtained.
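
    As a toy illustration of the purely gated policy (not of the paper's stationary-regime construction), the simulation below serves, at each station visit, exactly the customers present at the gating instant; cyclic routing, Poisson input, and the switch-over time are simplifying assumptions.

```python
# Toy simulation of a two-station polling system under the purely gated
# policy: at each visit the server "gates" the customers present and serves
# exactly those before switching. Cyclic routing, Poisson input, and the
# switch-over time of 0.1 are simplifying assumptions; the paper covers
# Markovian routing and general stationary ergodic (dependent) inputs.
import random

def polling_sim(lam=(0.3, 0.2), mu=1.5, horizon=10_000.0, seed=3):
    rng = random.Random(seed)
    queues = [0, 0]
    next_arr = [rng.expovariate(l) for l in lam]
    t, station, served = 0.0, 0, 0
    while t < horizon:
        for i in (0, 1):                      # admit arrivals up to time t
            while next_arr[i] <= t:
                queues[i] += 1
                next_arr[i] += rng.expovariate(lam[i])
        gated = queues[station]               # gate the customers now present
        for _ in range(gated):
            t += rng.expovariate(mu)          # serve each gated customer
        queues[station] -= gated
        served += gated
        station = 1 - station                 # cyclic routing (simplification)
        t += 0.1                              # switch-over time (assumed)
    return served / horizon                   # long-run departure rate

print(polling_sim())  # should approach the total arrival rate 0.3 + 0.2 = 0.5
```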

    Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization

    We consider the setting of distributed empirical risk minimization, where multiple machines compute the gradients in parallel and a centralized server updates the model parameters. In order to reduce the number of communications required to reach a given accuracy, we propose a preconditioned accelerated gradient method where the preconditioning is done by solving a local optimization problem over a subsampled dataset at the server. The convergence rate of the method depends on the square root of the relative condition number between the global and local loss functions. We estimate the relative condition number for linear prediction models by studying uniform concentration of the Hessians over a bounded domain, which allows us to derive improved convergence rates for existing preconditioned gradient methods and for our accelerated method. Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime.
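
    A hedged sketch of the preconditioned step (without the acceleration) is below: the server updates the model by solving a subproblem built from its local subsample, which for quadratic losses reduces to a linear solve against the local Hessian. This mirrors DANE-style preconditioning; the data, subsample size, and regularization mu are all assumptions.

```python
# Hedged sketch of statistical preconditioning (acceleration omitted): the
# server takes steps x <- x - (H_loc + mu*I)^{-1} grad, where H_loc is the
# Hessian of the loss on its local subsample and grad is the aggregated
# global gradient. Quadratic losses make the subproblem a linear solve; all
# data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_local = 10_000, 20, 500
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
A_loc = A[:n_local]                     # server's statistically similar subsample

H_loc = A_loc.T @ A_loc / n_local       # local Hessian used as preconditioner
mu = 0.1                                # proximal regularization (assumed)
x = np.zeros(d)
for _ in range(20):
    grad = A.T @ (A @ x - b) / n        # global gradient (one aggregation round)
    x = x - np.linalg.solve(H_loc + mu * np.eye(d), grad)
print(np.linalg.norm(A @ x - b) / np.sqrt(n))   # residual after 20 communications
```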